Description

This invention relates to the field of data processing. More particularly, this invention relates to data processing 
utilizing multiple sets of program instruction words. 
Data processing systems utilize a processor core operating under control of program instruction words, which
when decoded serve to generate control signals to control the different elements within the processor core to perform
the necessary functions to achieve the processing specified in the program instruction word.

A typical processor core will have data pathways of a given bit width that limit the length of the data words that 
can be manipulated in response to a given instruction. The trend in the field of data processing has been for a steady 
increase in these data pathway widths, e.g. a gradual move from 8-bit architectures to 16-bit, 32-bit and 64-bit architectures.
At the same time as this increase in data pathway width, the instruction sets have increased in the number
of instructions possible (in both the CISC and RISC philosophies) and the bit length of those instructions. As an example, 
there has been a move from the use of 16-bit architectures with 16-bit instruction sets to the use of 32-bit architectures 
with 32-bit instruction sets. 

A problem with migration towards increased architecture widths is the desire to maintain backward compatibility
with program software written for preceding generations of machines. One way of addressing this has been to provide
the new system with a compatibility mode. For example, the VAX11 computers of Digital Equipment Corporation have
a compatibility mode that enables them to decode the instructions for the earlier PDP11 computers. Whilst this allows
the earlier program software to be used, such use is not taking full advantage of the increased capabilities of the new
processing system upon which it is running, e.g. perhaps only multiple stage 16-bit arithmetic is being used when the
system in fact has the hardware to support 32-bit arithmetic.

Another problem associated with such changes in architecture width is that the size of computer programs using 
the new increased bit width instruction sets tends to increase (a 32-bit program instruction word occupies twice the 
storage space of a 16-bit program instruction word). Whilst this increase in size is to some extent offset by a single
instruction being made to specify an operation that might previously have needed more than one of the shorter instructions,
the trend is still for increased program size.

An approach to dealing with this problem is to allow a user to effectively specify their own instruction set. The 
IBM370 computers made by International Business Machines Corporation incorporate a writable control store using 
which a user may set up their own individual instruction set mapping instruction program words to desired actions by
the different portions of the processor core. Whilst this approach gives good flexibility, it is difficult to produce high
speed operation and the writable control store occupies a disadvantageously large area of an integrated circuit. Furthermore,
the design of an efficient bespoke instruction set is a burdensome task for a user.

European Published Patent Application EP-A-0 169 565 discloses a system with a 16-bit processor core. In a first
mode the system operates on 16-bit instructions that specify 16-bit data processing operations. In a second mode the
system operates on 8-bit instructions that specify 8-bit data processing operations. An 8-bit instruction is mapped using
a programmed memory to a 16-bit instruction which is then decoded. This decoded 16-bit instruction then performs
the specified 8-bit data processing operation.

It is also known to provide systems in which a single instruction set has program instruction words of differing 
lengths. An example of this approach is the 6502 microprocessor produced by MOS Technology. This processor uses
8-bit operation codes that are followed by a variable number of operand bytes. The operation code has first to be
decoded before the operands can be identified and the instruction effected. This requires multiple memory fetches and
represents a significant constraint on system performance compared with program instruction words (i.e. operation
code and any operands) of a constant known length.

IBM Technical Disclosure Bulletin, Vol. 15, No. 3, August 1972, page 920, discloses a system which converts 17-bit
instructions to 20-bit instructions.

European Published Patent Application EP-A-0 199 173 discloses a microprocessor including an instruction decoder.

Viewed from one aspect the invention provides apparatus for processing data, said apparatus comprising: 

a processor core having N-bit data pathways and being responsive to a plurality of core control signals;

first decoding means for decoding X-bit program instruction words, specifying N-bit data processing operations,
from a first permanent instruction set to generate said core control signals to trigger processing utilizing said N-bit
data pathways;

second decoding means for decoding Y-bit program instruction words from a second permanent instruction set to
generate said core control signals to trigger processing, Y being less than X; and

an instruction set switch for selecting either a first processing mode using said first decoding means upon received
program instruction words or a second processing mode using said second decoding means upon received program
instruction words; characterised in that

said Y-bit program instruction words specify N-bit data processing operations utilizing said N-bit data pathways.

The invention recognises that in a system having a wide standard X-bit instruction set and N-bit data pathways 
(e.g. a 32-bit instruction set operating on 32-bit data pathways), the full capabilities of the X-bit instruction set are often
not used in normal programming. An example of this would be a 32-bit branch instruction. This branch instruction might
have a 32 megabyte range that would only very occasionally be used. Thus, in most cases the branch would only be
for a few instructions and most of the bits within the 32-bit instruction would be carrying no information. Many programs
written using the 32-bit instruction set would have a low code density and utilize more program storage space than
necessary. 

The invention addresses this problem by providing a separate permanent Y-bit instruction set, where Y is less than
X, that still operates on the full N-bit data pathways. Thus, the performance of the N-bit data pathways is utilized whilst
code density is increased for those applications not requiring the sophistication of the X-bit instruction set.

There is a synergy in the provision of the two permanent instruction sets. The user is allowed the flexibility to alter
the instruction set they are using to suit the circumstances of the program, with both instruction sets being efficiently
implemented by the manufacturer (critical in high performance systems such as RISC processors where relative timings
are critical), and without sacrificing the use of the N-bit data pathways.

Another advantage of this arrangement is that since fewer bytes of program code will be run per unit time when
operating with the Y-bit instruction set, less stringent demands are placed upon the data transfer capabilities of the
memory systems storing the program code. This reduces complexity and cost.

The invention also moves in the opposite direction to the usual trend in the field. The trend is that with each new
generation of processors, more instructions are added to the instruction sets with the instruction sets becoming wider
to accommodate this. In contrast, the invention starts with a wide sophisticated instruction set and then adds a further
narrower instruction set (with less space for large numbers of instructions) for use in situations where the full scope of
the wide instruction set is not required.

It will be appreciated that the first instruction set and the second instruction set may be completely independent.
However, in preferred embodiments of the invention said second instruction set provides a subset of operations provided
by said first instruction set.

Providing that the second instruction set is a sub-set of the first instruction set enables more efficient operation 
since the hardware elements of the processor core may be set out more readily to suit both instruction sets. 

When an instruction set of program instruction words of an increased bit length has been added to an existing
program instruction set, it is possible to ensure that the program instruction words from the two instruction sets are
orthogonal. However, the instruction set switch allows this constraint to be avoided and permits systems in which said
second instruction set is non-orthogonal to said first instruction set.

The freedom to use non-orthogonal instruction sets eases the task of the system designer and enables other
aspects of the instruction set design to be more effectively handled.

The instruction set switch could be a hardware type switch set by some manual intervention. However, in preferred 
embodiments of the invention said instruction set switch comprises means responsive to an instruction set flag, said 
instruction set flag being settable under user program control.

Enabling the instruction set switch to be used to switch between the first instruction set and the second instruction
set under software control is a considerable advantage. For example, a programmer may utilise the second instruction
set with its Y-bit program instruction words for reasons of increased code density for the majority of a program and
temporarily switch to the first instruction set with its X-bit program instruction words for those small portions of the
program requiring the increased power and sophistication of the first instruction set.

The support of two independent instruction sets may introduce additional complication into the system. In preferred
embodiments of the invention said processor core comprises a program status register for storing currently applicable
processing status data and a saved program status register, said saved program status register being utilized to store
processing status data associated with a main program when a program exception occurs causing execution of an
exception handling program, said instruction set flag being part of said processing status data.

Providing the instruction set flag as part of the programming status data ensures that it is saved when an exception
occurs. In this way, a single exception handler can handle exceptions from both processing modes and can be allowed
access to the saved instruction set flag within the saved program status register should this be significant in handling
the exception. Furthermore, the exception handler can be made to use either instruction set to improve either its speed
or code density as the design constraints require.
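
The saving and restoring of the instruction set flag across an exception can be sketched as follows. This is a minimal illustrative model only: the class names, the field names and the `handler_t_flag` parameter are assumptions introduced for the example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusData:
    """Processing status data (hypothetical field names)."""
    mode: str      # e.g. "user" or "exception"
    t_flag: bool   # instruction set flag: True selects the Y-bit instruction set


@dataclass
class Core:
    cpsr: StatusData                   # program status register (current)
    spsr: Optional[StatusData] = None  # saved program status register

    def take_exception(self, handler_t_flag: bool) -> None:
        # Save the main program's status data, including its flag,
        # then let the handler run in whichever instruction set it uses.
        self.spsr = self.cpsr
        self.cpsr = StatusData(mode="exception", t_flag=handler_t_flag)

    def return_from_exception(self) -> None:
        # Restoring the saved status data also restores the flag.
        if self.spsr is None:
            raise RuntimeError("no saved status data to restore")
        self.cpsr, self.spsr = self.spsr, None
```

In this sketch the handler can inspect `core.spsr.t_flag` to learn which instruction set the interrupted program was using, while itself running with either setting.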

In order to deal with the differing bit lengths of the different instruction sets, preferred embodiments of the invention
provide that said processor core comprises a program counter register and a program counter incrementer for incrementing
a program counter value stored within said program counter register to point to a next program instruction
word, said program counter incrementer applying a different increment step in said first processing mode than in said
second processing mode.
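
A mode-dependent increment step can be sketched as below. The sketch assumes a byte-addressed memory and the example widths X = 32 and Y = 16; the function name and parameters are hypothetical.

```python
def next_program_counter(pc: int, second_mode: bool,
                         x_bits: int = 32, y_bits: int = 16) -> int:
    """Return the address of the next program instruction word.

    Assumes byte addressing, so the step is the instruction width in bytes.
    """
    width_bits = y_bits if second_mode else x_bits
    return pc + width_bits // 8
```

With these assumptions the step is 4 bytes in the first processing mode and 2 bytes in the second.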
It will be appreciated that the shorter program instruction words of the second instruction set cannot contain as
much information as those of the first instruction set. In order to accommodate this it is preferred that space is saved
within the second instruction set by reducing the operand range that may be specified within a program instruction word.

In preferred embodiments of the invention said processor core is coupled to a memory system by a Y-bit data bus,
such that program instruction words from said second instruction set require a single fetch cycle and program instruction
words from said first instruction set require a plurality of fetch cycles.

The use of a Y-bit data bus and memory system allows a less expensive total system to be built whilst still enabling
a single fetch cycle for each program instruction word for at least the second instruction set.

The first decoding means and the second decoding means may be completely separate. However, in preferred
embodiments of the invention said second decoding means reuses at least a part of said first decoding means.

The re-use of at least part of the first decoding means by the second decoding means reduces the overall circuit
area. Furthermore, since the second instruction set is generally less complicated than the first instruction set and is
driving the same processor core, there will be a considerable amount of the first decoding means that it is possible
to re-use.

Viewed from another aspect the invention provides a method of processing data, said method comprising the steps
of:

selecting either a first processing mode or a second processing mode for a processor core having N-bit data
pathways and being responsive to a plurality of core control signals;

in said first processing mode, decoding X-bit program instruction words, specifying N-bit data processing operations,
from a first permanent instruction set to generate said core control signals to trigger processing utilizing said
N-bit data pathways; and

in said second processing mode, decoding Y-bit program instruction words from a second permanent instruction
set to generate said core control signals to trigger processing, Y being less than X; characterised in that

said Y-bit program instruction words specify N-bit data processing operations utilizing said N-bit data pathways.

An embodiment of the invention will now be described, by way of example only, with reference to the accompanying 
drawings in which: 

Figure 1 schematically illustrates a data processing apparatus incorporating a processor core and a memory system;
Figure 2 schematically illustrates an instruction and instruction decoder for a system having a single instruction set;
Figure 3 illustrates an instruction pipeline and instruction decoders for use in a system having two instruction sets;
Figure 4 illustrates the decoding of an X-bit program instruction word;
Figures 5 and 6 illustrate the mapping of Y-bit program instruction words to X-bit program instruction words;
Figure 7 illustrates an X-bit instruction set;
Figure 8 illustrates a Y-bit instruction set; and
Figure 9 illustrates the processing registers available to the first instruction set and the second instruction set.

Figure 1 illustrates a data processing system (that is formed as part of an integrated circuit) comprising a processor
core 2 coupled to a Y-bit memory system 4. In this case, Y is equal to 16.

The processor core 2 includes a register bank 6, a Booths multiplier 8, a barrel shifter 10, a 32-bit arithmetic logic
unit 12 and a write data register 14. Interposed between the processor core 2 and the memory system 4 is an instruction
pipeline 16, an instruction decoder 18 and a read data register 20. A program counter register 22, which is part of the
processor core 2, is shown addressing the memory system 4. A program counter incrementer 24 serves to increment
the program counter value within the program counter register 22 as each instruction is executed and a new instruction
must be fetched for the instruction pipeline 16.

The processor core 2 incorporates N-bit data pathways (in this case 32-bit data pathways) between the various
functional units. In operation, instructions within the instruction pipeline 16 are decoded by the instruction decoder 18
which produces various core control signals that are passed to the different functional elements within the processor
core 2. In response to these core control signals, the different portions of the processor core conduct 32-bit processing
operations, such as 32-bit multiplication, 32-bit addition and 32-bit logical operations.

The register bank 6 includes a current programming status register 26 and a saved programming status register
28. The current programming status register 26 holds various condition and status flags for the processor core 2. These
flags may include processing mode flags (e.g. system mode, user mode, memory abort mode etc.) as well as flags
indicating the occurrence of zero results in arithmetic operations, carries and the like. The saved programming status
register 28 (which may be one of a banked plurality of such saved programming status registers) is used to temporarily
store the contents of the current programming status register 26 if an exception occurs that triggers a processing mode
switch. In this way, exception handling can be made faster and more efficient.

Included within the current programming status register 26 is an instruction set flag T. This instruction set flag is 
supplied to the instruction decoder 18 and the program counter incrementer 24. When this instruction set flag T is set, 
the system operates with the instructions of the second instruction set (i.e. Y-bit program instruction words, in this case 
16-bit program instruction words). The instruction set flag T controls the program counter incrementer 24 to adopt a
smaller increment step when operating with the second instruction set. This is consistent with the program instruction
words of the second instruction set being smaller and so more closely spaced within the memory locations of the 
memory system 4. 

As previously mentioned, the memory system 4 is a 16-bit memory system connected via 16-bit data buses to the
read data register 20 and the instruction pipeline 16. Such 16-bit memory systems are simpler and less expensive than
higher performance 32-bit memory systems. Using such a 16-bit memory system, 16-bit program instruction words
can be fetched in a single cycle. However, if 32-bit instructions from the first instruction set are to be used (as
indicated by the instruction set flag T), then two instruction fetches are required to recover a single 32-bit instruction
for the instruction pipeline 16.
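
The fetch arithmetic described above can be sketched as follows. The function names are hypothetical, and the order in which the two 16-bit halves of a 32-bit instruction arrive (low half first) is an assumption made for the example only.

```python
def fetch_cycles(instruction_bits: int, bus_bits: int = 16) -> int:
    """Number of bus transfers needed to recover one instruction word."""
    return -(-instruction_bits // bus_bits)  # ceiling division


def assemble_word(halfwords):
    """Combine 16-bit fetches into one word, assuming low half first."""
    word = 0
    for index, half in enumerate(halfwords):
        word |= (half & 0xFFFF) << (16 * index)
    return word
```

For the 16-bit memory system of this embodiment, `fetch_cycles(16)` gives one cycle and `fetch_cycles(32)` gives two.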

Once the required program instruction words have been recovered from the memory system 4, they are decoded
by the instruction decoder 18 and initiate 32-bit processing within the processor core 2 irrespective of whether the
instructions are 16-bit instructions or 32-bit instructions.

The instruction decoder 18 is illustrated in Figure 1 as a single block. However, in order to deal with more than
one instruction set, the instruction decoder 18 has a more complicated structure as will be discussed in relation to
Figures 2 and 3.

Figure 2 illustrates the instruction pipeline 16 and an instruction decoder 18 for coping with a single instruction set.
In this case, the instruction decoder 18 includes only a first decoding means 30 that is operative to decode 32-bit
instructions. This decoding means 30 decodes the first instruction set (the ARM instruction set) utilising a programmable
logic array (PLA) to produce a plurality of core control signals 32 that are fed to the processor core 2. The program
instruction word which is currently decoded (i.e. yields the current core control signals 32) is also held within an
instruction register 34. Functional elements within the processor core 2 (e.g. the Booths multiplier 8 or the register
bank 6) read operands needed for their processing operation directly from this instruction register 34.

A feature of the operation of such an arrangement is that the first decoding means 30 requires certain of its inputs
(the P bits shown as solid lines emerging from the PipeC pipeline stage) early in the clock cycle in which the first
decoding means operates. This is to ensure that the core control signals 32 are generated in time to drive the necessary
elements within the processor core 2. The first decoding means 30 is a relatively large and slow programmable logic
array structure and so such timing considerations are important.

The design of such programmable logic array structures to perform instruction decoding is conventional within the
art. A set of inputs is defined together with the desired outputs to be generated from those inputs. Commercially
available software is then used to devise a PLA structure that will generate the specified set of outputs from the specified
set of inputs.

Figure 3 illustrates the system of Figure 2 modified to deal with decoding a first instruction set and a second
instruction set. When the first instruction set is selected by the instruction set flag T, then the system operates as
described in relation to Figure 2. When the instruction set flag T indicates that the instructions in the instruction pipeline
16 are from the second instruction set, a second decoding means 36 becomes active.

This second decoding means decodes the 16-bit instructions (the Thumb instructions) utilising a fast PLA 38 and
a parallel slow PLA 40. The fast PLA 38 serves to map a subset (Q bits) of the bits of the 16-bit Thumb instructions to
the P bits of the corresponding 32-bit ARM instructions that are required to drive the first decoding means 30. Since
a relatively small number of bits are required to undergo this mapping, the fast PLA 38 can be relatively shallow and
so operate quickly enough to allow the first decoding means sufficient time to generate the core control signals 32 in
response to the contents of PipeC. The fast PLA 38 can be considered to act to "fake" the critical bits of a corresponding
32-bit instruction for the first decoding means without spending any unnecessary time mapping the full instruction.

However, the full 32-bit instruction is still required by the processor core 2 if it is to be able to operate without radical
alterations and significant additional circuit elements. With the time critical mapping having been taken care of by the
fast PLA 38, the slow PLA 40 connected in parallel serves to map the 16-bit instruction to the corresponding 32-bit
instruction and place this into the instruction register 34. This more complicated mapping may take place over the full
time it takes the fast PLA 38 and the first decoding means 30 to operate. The important factor is that the 32-bit instruction
should be present within the instruction register 34 in sufficient time for any operands to be read therefrom in response
to the core control signals 32 acting upon the processor core 2.

It will be appreciated that the overall action of the system of Figure 3 when decoding the second instruction set is
to translate 16-bit instructions from the second instruction set to 32-bit instructions from the first instruction set as they
progress along the instruction pipeline 16. This is rendered a practical possibility by making the second instruction set
a subset of the first instruction set so as to ensure that there is a one to one mapping of instructions from the second
instruction set into instructions within the first instruction set.

The provision of the instruction set flag T enables the second instruction set to be non-orthogonal to the first 
instruction set. This is particularly useful in circumstances where the first instruction set is an existing instruction set 
without any free bits that could be used to enable an orthogonal further instruction set to be detected and decoded. 

Figure 4 illustrates the decoding of a 32-bit instruction. At the top of Figure 4 successive processing clock cycles
are illustrated in which a fetch operation, a decode operation and finally an execute operation are performed. If the particular
instruction so requires (e.g. a multiply instruction), then one or more additional execute cycles may be added.

A 32-bit instruction 42 is composed of a plurality of different fields. The boundaries between these fields will differ 
for differing instructions as will be shown later in Figure 7. 

Some of the bits within the instruction 42 require decoding within a primary decode phase. These P bits are bits
4 to 7, 20 and 22 to 27. These are the bits that are required by the first decoding means 30 and that must be "faked"
by the fast PLA 38. These bits must be applied to the first decoding means and decoded thereby to generate appropriate
core control signals 32 by the end of the first part of the decode cycle. Decoding of the full instruction can, if necessary,
take as long as the end of the decode cycle. At the end of the decode cycle, operands within the instruction are read from
the instruction register 34 by the processor core 2 during the execute cycle. These operands may be register specifiers,
offsets or other variables.
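
The set of P bits listed above (bits 4 to 7, 20 and 22 to 27) can be expressed as a bit mask over the 32-bit instruction. The following is a small illustrative sketch; the names `P_BIT_POSITIONS`, `P_MASK` and `primary_decode_bits` are introduced here for the example and are not terms from the specification.

```python
# Bit positions needed in the primary decode phase, as listed in the text.
P_BIT_POSITIONS = (*range(4, 8), 20, *range(22, 28))

# One mask bit per P bit position.
P_MASK = sum(1 << position for position in P_BIT_POSITIONS)


def primary_decode_bits(instruction: int) -> int:
    """Keep only the P bits of a 32-bit instruction word, in place."""
    return instruction & P_MASK
```

Eleven of the 32 bits are selected, which illustrates why the mapping onto them can be kept shallow.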

Figure 5 shows the mapping of an example of a 16-bit instruction to a 32-bit instruction. The thick lines originate
from the Q bits within the 16-bit instruction that require mapping into the P bits within the 32-bit instruction so that they
may be applied to the first decoding means 30. It will be seen that the majority of these bits are either copied straight
across or involve a simple mapping. The operands Rn', Rd and Immediate within the 16-bit instruction require padding
at their most significant end with zeros to fill the 32-bit instruction. This padding is needed as a result of the 32-bit
instruction operands having a greater range than the 16-bit instruction operands.

It will be seen from the generalised form of the 32-bit instruction given at the bottom of Figure 5, that the 32-bit
instruction allows considerably more flexibility than the subset of that instruction that is represented by the 16-bit instruction.
For example, the 32-bit instructions are preceded by condition codes Cond that render the instruction conditionally
executable. In contrast, the 16-bit instructions do not carry any condition codes in themselves and the condition
codes of the 32-bit instructions to which they are mapped are set to a value of "1110" that is equivalent to the conditional
execution state "always".
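
The kind of expansion described above (zero padding of the narrower operands and a fixed "1110" condition field) can be sketched for the ADD/SUB immediate instructions listed under Format 1. This is an illustration only and is not the instruction shown in Figure 5. The 16-bit field layout used here (bit 9 = Op, bits 8 to 6 = Immediate3, bits 5 to 3 = Rs, bits 2 to 0 = Rd) is a hypothetical assumption, since the text does not give the bit positions; the 32-bit side follows the ARM data processing format with an immediate operand and the S bit set.

```python
COND_ALWAYS = 0b1110                  # condition field meaning "always"
ARM_OPCODE = {0: 0b0100, 1: 0b0010}   # Op 0 -> ADD, Op 1 -> SUB


def expand_add_sub_immediate(halfword: int) -> int:
    """Map a 16-bit ADD/SUB immediate instruction to a 32-bit instruction.

    The 16-bit layout is an assumption made for this sketch.
    """
    op = (halfword >> 9) & 0x1
    imm3 = (halfword >> 6) & 0x7
    rs = (halfword >> 3) & 0x7
    rd = halfword & 0x7
    return (
        (COND_ALWAYS << 28)       # 16-bit instructions carry no condition code
        | (1 << 25)               # operand 2 is an immediate
        | (ARM_OPCODE[op] << 21)
        | (1 << 20)               # S bit: both ops set the condition code flags
        | (rs << 16)              # 3-bit register specifier zero-padded to 4 bits
        | (rd << 12)
        | imm3                    # 3-bit immediate zero-padded to the wider field
    )
```

Under the assumed layout, the halfword 0x0048 (ADD r0, r1, #1) expands to 0xE2910001.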

Figure 6 illustrates another such instruction mapping. The 16-bit instruction in this case is a different type of Load/Store
instruction to that illustrated in Figure 5. However, this instruction is still a subset of the single data transfer
instruction of the 32-bit instruction set.

Figure 7 schematically illustrates the formats of the eleven different types of instruction for the 32-bit instruction
set. These instructions are in turn:

1. Data processing/PSR transfer;
2. Multiply;
3. Single data swap;
4. Single data transfer;
5. Undefined;
6. Block data transfer;
7. Branch;
8. Co-processor data transfer;
9. Co-processor data operation;
10. Co-processor register transfer; and
11. Software interrupt.



A full description of this instruction set may be found in the Data Sheet of the ARM6 processor produced by Advanced 
RISC Machines Limited. The instruction highlighted within Figure 7 is that illustrated in Figures 5 and 6. 

Figure 8 illustrates the 16-bit instruction set that is provided in addition to the 32-bit instruction set. The instructions
highlighted within this instruction set are those illustrated in Figures 5 and 6 respectively. The instructions within this
16-bit instruction set have been chosen such that they may all be mapped to a single 32-bit instruction and so form a
subset of the 32-bit instruction set.

Passing in turn between each of the instructions in this instruction set, the formats specify the following: 

Format 1    Op = 0, 1. Both ops set the condition code flags.
            0: ADD Rd, Rs, #Immediate3
            1: SUB Rd, Rs, #Immediate3

Format 2    Op = 0, 1. Both ops set the condition code flags.
            0: ADD Rd, Rm, Rn
            1: SUB Rd, Rm, Rn



Format 3    3 opcodes. Used to build large immediates.
            1 = ADD Rd, Rd, #Immediate8 << 8
            2 = ADD Rd, Rd, #Immediate8 << 16
            3 = ADD Rd, Rd, #Immediate8 << 24



Format 4    Op gives 3 opcodes, all operations are MOVS Rd, Rs SHIFT
            #Immediate5, where SHIFT is
            0 is LSL
            1 is LSR
            2 is ASR
            Shifts by zero as defined on ARM.

Format 5    Op1*8+Op2 gives 32 ALU opcodes, Rd = Rd op Rn. All
            operations set the condition code flags.
            The operations are
            AND, OR, EOR, BIC (AND NOT), NEGATE, CMP, CMN, MUL
            TST, TEQ, MOV, MVN (NOT), LSL, LSR, ASR, ROR
            Missing ADC, SBC, MULL
            Shifts by zero and greater than 31 as defined on ARM
            8 special opcodes, LO specifies Reg 0-7, HI specifies a
            register 8-15
            SPECIAL is CPSR or SPSR
            MOV HI, LO (move hidden register to visible register)
            MOV LO, HI (move visible register to hidden register)
            MOV HI, HI (eg procedure return)
            MOVS HI, HI (eg exception return)
            MOVS HI, LO (eg interrupt return, could be SUBS HI, HI, #4)
            MOV SPECIAL, LO (MSR)
            MOV LO, SPECIAL (MRS)
            CMP HI, HI (stack limit check)
            8 free opcodes



Format 6    Op gives 4 opcodes. All operations set the condition
            code flags
            0: MOV Rd, #Immediate8
            1: CMP Rs, #Immediate8
            2: ADD Rd, Rd, #Immediate8
            It is possible to trade ADD for ADD Rd, Rs, #Immediate5

Format 7    Loads a word PC + Offset (256 words, 1024 bytes). Note
            the offset must be word aligned.
            LDR Rd,[PC,#+1024]
            This instruction is used to access the next literal pool, to load constants, addresses etc.

Format 8    Load and Store Word from SP (r7) + 256 words (1024
            bytes)
            Load and Store Byte from SP (r7) + 256 bytes
            LDR Rd,[SP,#+1024]
            LDRB Rd,[SP,#+256]
            These instructions are for stack and frame access.


Format 9    Load and Store Word (or Byte), signed 3 bit Immediate
            Offset (Post Inc/Dec), Forced Writeback
            L is Load/Store, U is Up/Down (add/subtract offset), B
            is Byte/Word
            LDR(B) Rd, [Rb],#+/-Offset3
            STR(B) Rd, [Rb],#+/-Offset3
            These instructions are intended for array access
            The offset encodes 0 - 7 for bytes and 0, 4 - 28 for words


Format 10   Load and Store Word (or Byte) with signed Register
            Offset (Pre Inc/Dec), No writeback
            L is Load/Store, U is Up/Down (add/subtract offset), B
            is Byte/Word
            LDR Rd,[Rb, +/-Ro, LSL #2]
            LDRB Rd,[Rb, +/-Ro]
            STRB Rd,[Rb, +/-Ro]
            These instructions are intended for base + offset pointer access and, combined with the 8-bit MOV,
            ADD, SUB, give fairly quick immediate offset access


Format 11   Load and Store Word (or Byte) with signed 5 bit
            Immediate Offset (Pre Inc/Dec), No Writeback
            L is Load/Store, B is Byte/Word
            LDR(B) Rd,[Rb,#+Offset5]
            STR(B) Rd,[Rb,#+Offset5]
            These instructions are intended for structure access
            The offset encodes 0 - 31 for bytes and 0, 4 - 124 for words


Format 12 Load and Store Multiple (Forced Writeback).
LDMIA Rb!, (Rlist)
STMIA Rb!, (Rlist)

Rlist specifies registers r0-r7.
A sub-class of these instructions is a pair of subroutine call and return instructions.
For LDM, if r7 is the base and bit 7 is set in rlist, the PC is loaded.
For STM, if r7 is the base and bit 7 is set in rlist, the LR is stored.
If r7 is used as the base register, sp is used instead.
In both cases a Full Descending Stack is implemented, i.e. LDM is like ARM's LDMFD, STM is like ARM's STMFD.
So for block copy, use r7 as the end pointer.
If r7 is not the base, LDM and STM are like ARM's LDMIA, STMIA.
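
The two addressing behaviours of Format 12 can be sketched as below. This is an illustrative model only: registers are a list indexed 0 to 7 with index 7 standing for SP, memory is a dictionary keyed by byte address, and the special case in which bit 7 of rlist selects the PC or LR is deliberately not modelled.

```python
SP = 7  # r7 acts as the stack pointer in this format


def stm(regs, mem, base, rlist):
    """Store the listed registers, lowest register at the lowest address, with writeback."""
    order = sorted(rlist)
    addr = regs[base]
    if base == SP:
        addr -= 4 * len(order)          # full descending: decrement before storing
        regs[base] = addr
    for i, r in enumerate(order):
        mem[addr + 4 * i] = regs[r]
    if base != SP:
        regs[base] = addr + 4 * len(order)  # increment after


def ldm(regs, mem, base, rlist):
    """Load the listed registers; LDMFD and LDMIA both read upwards and increment after."""
    order = sorted(rlist)
    addr = regs[base]
    for i, r in enumerate(order):
        regs[r] = mem[addr + 4 * i]
    regs[base] = addr + 4 * len(order)
```

Pushing with `stm` on SP and popping with `ldm` on SP restores both the registers and the stack pointer, which is the property the subroutine call and return pair relies on.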


Format 13 Load address. This instruction adds an 8 bit unsigned constant to either the PC or the stack pointer
and stores the result in the destination register.
ADD Rd, sp, + 256 bytes
ADD Rd, pc, + 256 words (1024 bytes)

The SP bit indicates if the SP or the PC is the source.
If SP is the source, and r7 is specified as the destination register, SP is used as the destination register.

Format 14 Conditional branch, +/- 128 bytes, where cond defines the condition code (as on ARM). cond = 15
encodes as SWI (only 256, should be plenty).

Format 15 Sets bits 22:12 of a long branch and link. MOV lr, #offset << 12.

Format 16 Performs a long branch and link. Operation is SUB newlr, pc, #4; ORR pc, oldlr, #offset << 1. oldlr
and newlr mean the lr register before and after the operation.
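
The two-instruction long branch and link can be sketched from the operations quoted for Formats 15 and 16. The field widths used here (11 bits for bits 22:12 and 11 bits for bits 11:1) are inferred from the stated bit range and shift amounts and are assumptions, as are the function names and the interpretation of `pc` as the value the second instruction reads.

```python
FIELD_MASK = 0x7FF  # assumed 11-bit offset field in each half


def split_target(target):
    """Split a halfword-aligned target below 2**23 into the two offset fields."""
    if target & 1 or not 0 <= target < (1 << 23):
        raise ValueError("target must be halfword aligned and fit in 23 bits")
    high = (target >> 12) & FIELD_MASK   # carried by the first instruction
    low = (target >> 1) & FIELD_MASK     # carried by the second instruction
    return high, low


def long_branch_and_link(pc, high, low):
    """Apply both halves; return (new pc, new lr)."""
    old_lr = high << 12                  # MOV lr, #offset << 12
    new_lr = pc - 4                      # SUB newlr, pc, #4
    new_pc = old_lr | (low << 1)         # ORR pc, oldlr, #offset << 1
    return new_pc, new_lr
```

The link register thus serves twice: first as scratch storage for the upper target bits, then as the holder of the return address.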



As previously mentioned, the 16-bit instruction set has reduced operand ranges compared to the 32-bit instruction
set. Commensurate with this, the 16-bit instruction set uses a subset of the registers 6 (see Figure 1) that are provided
for the full 32-bit instruction set. Figure 9 illustrates the subset of registers that are used by the 16-bit instruction set.



Claims



1. Apparatus for processing data, said apparatus comprising:

a processor core (2) having N-bit data pathways and being responsive to a plurality of core control signals (32);
first decoding means (30) for decoding X-bit program instruction words, specifying N-bit data processing
operations, from a first permanent instruction set to generate said core control signals to trigger processing
utilizing said N-bit data pathways;
second decoding means (36) for decoding Y-bit program instruction words from a second permanent instruction
set to generate said core control signals to trigger processing, Y being less than X; and
an instruction set switch for selecting either a first processing mode using said first decoding means upon
received program instruction words or a second processing mode using said second decoding means upon
received program instruction words; characterised in that
said Y-bit program instruction words specify N-bit data processing operations utilizing said N-bit data pathways.

2. Apparatus as claimed in claim 1, wherein said second instruction set provides a subset of operations provided by
said first instruction set.



3. Apparatus as claimed in any one of claims 1 and 2, wherein said second instruction set is non-orthogonal to said 
first instruction set. 


4. Apparatus as claimed in any one of claims 1, 2 and 3, wherein said instruction set switch comprises means
responsive to an instruction set flag (T), said instruction set flag being settable under user program control.

5. Apparatus as claimed in claim 4, wherein said processor core comprises a program status register (CPSR) for
storing currently applicable processing status data and a saved program status register (SPSR), said saved
program status register being utilized to store processing status data associated with a main program when a program
exception occurs causing execution of an exception handling program, said instruction set flag (T) being part of
said processing status data.
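
The interplay of the instruction set flag and the two status registers can be illustrated with a small model. It is a sketch under stated assumptions: the bit position chosen for T, the class and method names, and the choice that the exception handler starts in the first processing mode are not taken from the claims.

```python
T_BIT = 1 << 5  # assumed position of the instruction set flag within the status word


class Core:
    """Toy model of a core whose decoder is chosen by the T flag in CPSR."""

    def __init__(self):
        self.cpsr = 0
        self.spsr = 0

    def active_decoder(self):
        return "second (Y-bit)" if self.cpsr & T_BIT else "first (X-bit)"

    def enter_exception(self):
        self.spsr = self.cpsr      # save the main program's status, T flag included
        self.cpsr &= ~T_BIT        # assumption: the handler begins in the first mode

    def return_from_exception(self):
        self.cpsr = self.spsr      # restoring the status also restores the instruction set
```

Because T travels with the rest of the status data, the interrupted program resumes in whichever instruction set it was using without any extra bookkeeping.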



6. Apparatus as claimed in any one of the preceding claims, wherein said processor core comprises a program 
counter register (22) and a program counter incrementer (24) for incrementing a program counter value stored 
within said program counter register to point to a next program instruction word, said program counter incrementer 
applying a different increment step in said first processing mode than in said second processing mode. 
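
By way of example only, the mode-dependent increment of claim 6 might be modelled as below. The default widths of 32 and 16 bits match the embodiment described earlier; the function name and the byte-addressed program counter are assumptions.

```python
def next_pc(pc, second_mode, x_bits=32, y_bits=16):
    """Advance a byte-addressed program counter by one instruction word."""
    width = y_bits if second_mode else x_bits
    return pc + width // 8   # step of 4 bytes in the first mode, 2 bytes in the second
```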

7. Apparatus as claimed in any one of the preceding claims, wherein at least one program instruction word within 
said second instruction set has a reduced operand range compared to a corresponding program instruction word 
within said first instruction set. 

8. Apparatus as claimed in any one of the preceding claims, wherein said processor core is coupled to a memory
system (4) by a Y-bit data bus, such that program instruction words from said second instruction set require a
single fetch cycle and program instruction words from said first instruction set require a plurality of fetch cycles.
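
The fetch-cycle relationship in claim 8 reduces to a ceiling division of instruction width by bus width. A minimal sketch, with a hypothetical function name:

```python
def fetch_cycles(instruction_bits, bus_bits):
    """Number of bus transfers needed to fetch one instruction word."""
    return -(-instruction_bits // bus_bits)   # ceiling division
```

With a 16-bit bus, a 16-bit instruction word needs one transfer and a 32-bit instruction word needs two.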

9. Apparatus as claimed in any one of the preceding claims, wherein said second decoding means reuses at least
a part of said first decoding means.

10. Apparatus as claimed in any one of the preceding claims, wherein said apparatus is an integrated circuit. 

11. A method of processing data, said method comprising the steps of:

selecting either a first processing mode or a second processing mode for a processor core (2) having N-bit
data pathways and being responsive to a plurality of core control signals;
in said first processing mode, decoding X-bit program instruction words, specifying N-bit data processing
operations, from a first permanent instruction set to generate said core control signals to trigger processing
utilizing said N-bit data pathways; and
in said second processing mode, decoding Y-bit program instruction words from a second permanent
instruction set to generate said core control signals to trigger processing, Y being less than X; characterised in that
said Y-bit program instruction words specify N-bit data processing operations utilizing said N-bit data pathways.

Patentansprüche

1. Vorrichtung zur Datenverarbeitung, wobei die Vorrichtung aufweist:

einen Prozessorkern (2), der N-Bit Datenwege hat und auf eine Mehrzahl von Kernsteuersignalen (32)
anspricht,
eine erste Dekodiereinrichtung (30), zum Dekodieren von Programmanweisungsworten mit X-Bit, welche die
N-Bit Datenverarbeitungsvorgänge spezifizieren, und zwar aus einem ersten dauerhaften Befehlssatz, um die
Kernsteuersignale zu erzeugen, um eine Verarbeitung unter Verwendung der N-Bit Datenwege auszulösen,
eine zweite Dekodiereinrichtung (36), zum Dekodieren von Programmbefehlsworten mit Y-Bit aus einem
zweiten dauerhaften Befehlssatz, um die Kernsteuersignale zu erzeugen, um eine Verarbeitung auszulösen, wobei
Y kleiner als X ist, und
einen Befehlssatzschalter zum Auswählen entweder einer ersten Verarbeitungsweise unter Verwendung der
ersten Dekodiereinrichtung aufgrund von empfangenen Programmbefehlsworten, oder einer zweiten
Verarbeitungsweise unter Verwendung der zweiten Dekodiereinrichtung aufgrund empfangener
Programmbefehlsworte, dadurch gekennzeichnet, daß
die Programmbefehlsworte mit Y-Bit N-Bit Datenverarbeitungsvorgänge unter Verwendung der N-Bit
Datenwege spezifizieren.

2. Vorrichtung nach Anspruch 1, wobei der zweite Befehlssatz einen Teilsatz von Vorgängen bereitstellt, die durch
den ersten Befehlssatz bereitgestellt werden.

3. Vorrichtung nach einem der Ansprüche 1 oder 2, wobei der zweite Befehlssatz nicht orthogonal zu dem ersten
Befehlssatz ist.

4. Vorrichtung nach einem der Ansprüche 1, 2 oder 3, wobei der Befehlssatzschalter Einrichtungen aufweist, die auf
eine Befehlssatzanzeige (T) ansprechen, wobei die Befehlssatzanzeige unter der Programmsteuerung des
Benutzers gesetzt werden kann.

5. Vorrichtung nach Anspruch 4, wobei der Prozessorkern ein Programmstatusregister (CPSR) zum Speichern
gerade anwendbarer Verarbeitungszustandsdaten aufweist, sowie ein Statusregister (SPSR) für ein aufbewahrtes
Programm aufweist, wobei das Statusregister für das aufbewahrte Programm verwendet wird, um
Verarbeitungszustandsdaten zu speichern, die zu einem Hauptprogramm gehören, wenn eine Programmausnahme auftritt,
welche die Ausführung eines Ausnahmehandhabungsprogramms bewirkt, wobei die Befehlsanzeige (T) ein Teil der
Verarbeitungszustandsdaten ist.

6. Vorrichtung nach einem der vorstehenden Ansprüche, wobei der Prozessorkern ein Programmzählerregister (22)
und einen Programmzählerheraufsetzer (24) zum Heraufsetzen eines Programmzählerwertes aufweist, welcher
in dem Programmzählerregister gespeichert ist, um auf ein nächstes Programmbefehlswort zu weisen, wobei der
Programmzählerheraufsetzer in dem ersten Verarbeitungszustand um eine andere Schrittweite heraufsetzt als in
dem zweiten Verarbeitungszustand.

7. Vorrichtung nach einem der vorstehenden Ansprüche, wobei zumindest ein Programmbefehlswort in dem zweiten
Befehlssatz im Vergleich zu einem entsprechenden Programmbefehlswort in dem ersten Befehlssatz einen
verminderten Operandenbereich hat.

8. Vorrichtung nach einem der vorstehenden Ansprüche, wobei der Prozessorkern mit einem Speichersystem (4)
über einen Y-Bit Datenbus verbunden ist, so daß Programmbefehlsworte aus dem zweiten Befehlssatz nur einen
einzigen Taktzyklus erfordern und Programmbefehlsworte aus dem ersten Befehlssatz eine Mehrzahl von
Taktzyklen erfordern.

9. Vorrichtung nach einem der vorstehenden Ansprüche, wobei die zweiten Dekodiereinrichtungen zumindest einen
Teil der ersten Dekodiereinrichtungen wiederverwenden.

10. Vorrichtung nach einem der vorstehenden Ansprüche, wobei die Vorrichtung ein integrierter Schaltkreis ist.

11. Verfahren zur Verarbeitung von Daten, wobei das Verfahren die Schritte aufweist:

Auswählen entweder eines ersten Verarbeitungszustandes oder eines zweiten Verarbeitungszustandes für
einen Prozessorkern (2), der N-Bit Datenwege hat und auf eine Mehrzahl von Kernsteuersignalen anspricht,
wobei in dem ersten Verarbeitungszustand X-Bit Programmbefehlsworte, durch welche N-Bit
Datenverarbeitungsvorgänge spezifiziert werden, aus einem ersten dauerhaften Befehlssatz dekodiert werden, um die
Kernsteuersignale zu erzeugen, um die Verarbeitung unter Verwendung der N-Bit Datenwege auszulösen,
und wobei
in dem zweiten Verarbeitungszustand Y-Bit Programmbefehlsworte von einem zweiten dauerhaften
Befehlssatz dekodiert werden, um die Kernsteuersignale zu erzeugen, um eine Verarbeitung auszulösen, wobei Y
kleiner als X ist, dadurch gekennzeichnet, daß
die Y-Bit Programmbefehlsworte N-Bit Datenverarbeitungsvorgänge unter Verwendung der N-Bit Datenwege
spezifizieren.



Revendications

1. Dispositif destiné à traiter des données, ledit dispositif comprenant :

un coeur de processeur (2) comportant des trajets de données à N binaires et étant sensible à une pluralité
de signaux (32) de commande de coeur ;
un premier moyen (30) de décodage destiné à décoder, à partir d'un premier ensemble permanent
d'instructions, des mots à X binaires d'instruction de programme, spécifiant des opérations de traitement de données
à N binaires, pour produire lesdits signaux de commande de coeur pour déclencher un traitement utilisant
lesdits trajets de données à N binaires ;
un second moyen (36) de décodage destiné à décoder, à partir d'un second ensemble permanent
d'instructions, des mots à Y binaires d'instruction de programme pour produire lesdits signaux de commande de coeur
pour déclencher un traitement, Y étant plus petit que X ; et
un sélecteur d'ensemble d'instructions destiné à sélectionner, soit un premier mode de traitement utilisant ledit
premier moyen de décodage sur des mots d'instruction de programme reçus, soit un second mode de
traitement utilisant ledit second moyen de décodage sur des mots d'instruction de programme reçus ; caractérisé
en ce que :
lesdits mots à Y binaires d'instruction de programme spécifient des opérations de traitement de données à N
binaires utilisant lesdits trajets de données à N binaires.

2. Dispositif selon la revendication 1, dans lequel ledit second ensemble d'instructions fournit un sous-ensemble
d'opérations fournies par ledit ensemble d'instructions.

3. Dispositif selon l'une quelconque des revendications 1 et 2, dans lequel ledit second ensemble d'instructions est
non orthogonal audit premier ensemble d'instructions.

4. Dispositif selon l'une quelconque des revendications 1, 2 et 3, dans lequel ledit sélecteur d'ensemble d'instructions
comprend un moyen sensible à un indicateur (T) d'ensemble d'instructions, ledit indicateur d'ensemble
d'instructions pouvant être activé sous les ordres d'un programme d'utilisateur.

5. Dispositif selon la revendication 4, dans lequel ledit coeur de processeur comprend un registre d'état de programme
(CPSR) destiné à mémoriser une donnée d'état de traitement en cours d'application et un registre d'état de
programme sauvegardé (SPSR), ledit registre d'état de programme sauvegardé servant à mémoriser une donnée
d'état de traitement associée à un programme principal lorsqu'il se produit une anomalie de programme provoquant
l'exécution d'un programme de traitement d'anomalie, ledit indicateur (T) d'ensemble d'instructions faisant partie
de ladite donnée d'état de traitement.

6. Dispositif selon l'une quelconque des revendications précédentes, dans lequel ledit coeur de processeur comprend
un registre (22) de compteur de programme et un incrémentateur (24) de compteur de programme destiné à
incrémenter une valeur de compteur de programme mémorisée à l'intérieur dudit registre de compteur de
programme pour pointer vers un prochain mot d'instruction de programme, ledit incrémentateur de compteur de
programme appliquant dans ledit premier mode de traitement un pas d'incrément différent de celui appliqué dans ledit
second mode de traitement.

7. Dispositif selon l'une quelconque des revendications précédentes, dans lequel au moins un mot d'instruction de
programme à l'intérieur dudit second ensemble d'instructions a une longueur d'opérande réduite par comparaison
avec un mot d'instruction de programme correspondant à l'intérieur dudit premier ensemble d'instructions.

8. Dispositif selon l'une quelconque des revendications précédentes, dans lequel ledit coeur de processeur est couplé
à un système (4) de mémoire par un bus de données à Y binaires, de sorte que des mots d'instruction de programme
issus dudit second ensemble d'instructions nécessitent un seul cycle de lecture et que des mots d'instructions de
programme issus dudit premier ensemble d'instructions nécessitent une pluralité de cycles de lecture.

9. Dispositif selon l'une quelconque des revendications précédentes, dans lequel ledit second moyen de décodage
réutilise au moins une partie dudit premier moyen de décodage.

10. Dispositif selon l'une quelconque des revendications précédentes, dans lequel ledit dispositif est un circuit intégré.

11. Procédé de traitement de données, ledit procédé comprenant les étapes :

de sélection, soit d'un premier mode de traitement, soit d'un second mode de traitement pour un coeur de
processeur (2) comportant des trajets de données à N binaires et étant sensible à une pluralité de signaux
de commande de coeur ;
dans ledit premier mode de traitement, de décodage, à partir d'un premier ensemble permanent d'instructions,
de mots à X binaires d'instruction de programme, spécifiant des opérations de traitement de données à N
binaires, pour produire lesdits signaux de commande de coeur pour déclencher un traitement utilisant lesdits
trajets de données à N binaires ; et
dans ledit second mode de traitement, de décodage, à partir d'un second ensemble permanent d'instructions,
de mots à Y binaires d'instruction de programme, pour produire lesdits signaux de commande de coeur pour
déclencher un traitement, Y étant plus petit que X ; caractérisé en ce que :
lesdits mots à Y binaires d'instruction de programme spécifient des opérations de traitement de données à N
binaires utilisant lesdits trajets de données à N binaires.